Deliver synchronous fault signals to the faulting thread#86
Conversation
test-fork's phase-2 signal child spins in `while (!got_usr1) usleep()` waiting for a SIGUSR1 sent cross-process by its parent. The signal was delivered only ~35% of the time and lost the rest, so `make check` hung at test-fork until the 60s per-test timeout -- often longer, as leaked `elfuse --fork-child` orphans kept the driver's stdout pipe open. Two complementary defects: 1. vcpu_run_loop treated HV_EXIT_REASON_UNKNOWN as fatal. A host SIGUSR2 (the cross-process guest-signal transport) that interrupts hv_vcpu_run mid-execution aborts the run with UNKNOWN rather than the clean CANCELED that hv_vcpus_exit() produces for a vCPU caught between runs. Route UNKNOWN through the same cancellation handling so the already-queued guest signal is delivered instead of crashing the child. 2. signal_deliver redirected to the handler only via ELR_EL1, which takes effect solely on an ERET from EL1 (the syscall-return path, gated by the shim's X8==2 exec_drop_frame marker). When the signal is delivered from the cancellation branch -- i.e. the vCPU was preempted while running EL0 code (cross-process SIGUSR2, or SIGALRM in a tight loop) -- there is no pending ERET, the resume uses HV_REG_PC, and the ELR_EL1 write is a no-op: the handler never runs and only the X0=signum clobber lands, re-running the interrupted nanosleep with a bogus arg and spinning forever. Detect EL0 preemption from the live PSTATE (CPSR M[3:0]==0), save the interrupted PC from HV_REG_PC instead of the stale ELR_EL1, and redirect HV_REG_PC/CPSR directly; skip the X8==2 marker since there is no shim frame to drop. test-fork now passes 20/20 (was ~7/20); `make check` is green with no hang.
A guest exception (SIGSEGV/SIGBUS/SIGILL/SIGFPE/SIGTRAP) was delivered by calling signal_set_fault_info() (which records si_code/si_addr in a _Thread_local slot) followed by signal_queue() + signal_deliver(). But signal_queue() sets a bit in the process-wide pending mask, and every vCPU thread runs signal_deliver() at its own poll points. So a fault raised by thread A could be dequeued and delivered by thread B, whose thread-local fault info is empty -- the siginfo then degrades to si_code=SI_USER with no si_addr instead of SEGV_MAPERR/si_addr. A JVM treats a SIGSEGV with si_code=SI_USER as a fatal external signal rather than a recoverable implicit null check, so javac (heavily multi-threaded, many null-check faults) crashed with a fatal error report. Two threads faulting on the same signal also collapsed into a single bitmask bit, dropping one fault entirely. Add signal_deliver_fault(), which delivers a given signal directly to the current thread using its thread-local fault info and never touches the process-wide pending set. signal_deliver()'s frame-building tail is factored into deliver_signal_locked() and shared. Route all four synchronous-fault sites in the vCPU loop (SEGV_ACCERR, BRK->SIGTRAP, SIGILL, data-abort SIGSEGV) through it. Add test-fault-signal-mt: N threads each take recoverable SIGSEGVs in a loop and assert every delivery is SEGV_MAPERR with the right si_addr on the faulting thread. It fails deterministically (rc=2, fault delivered to a non-faulting thread) on the old queue-based path and passes on the fix. (cherry picked from commit 201b4fa0d2ffb2218e252be71930e1cfb9e360a5)
| */ | ||
| pthread_mutex_lock(&sig_lock); | ||
| signal_rt_info_t rt_info = signal_default_info(signum); | ||
| return deliver_signal_locked(vcpu, g, signum, rt_info, exit_code); |
There was a problem hiding this comment.
The comment two lines up calls this "intentionally ignored: a synchronous fault cannot be postponed" -- that's only half the Linux contract. force_sig_info_to_task in kernel/signal.c resets the disposition to SIG_DFL and unblocks the signum before applying default, so a SIGSEGV/SIGBUS/SIGILL/SIGFPE/SIGTRAP that's blocked OR set to SIG_IGN terminates the process. This PR's behavior diverges in two ways:
- SIG_IGN on synchronous fault (preexisting):
deliver_signal_lockedat signal.c:1352 returns 0 for SIG_IGN. The vCPU loop resumes, PC unchanged, the same instruction re-faults forever. Linux would terminate. - Blocked synchronous fault (new in this PR): pre-PR,
signal_deliver'sdeliverable = sig_state.pending & ~*blockedshort-circuited and the thread re-faulted infinitely. Post-PR,signal_deliver_faultbypasses the blocked mask entirely and runs the user handler despite the block. Linux would reset to SIG_DFL + unblock + terminate.
JVM works either way because it never blocks SIGSEGV, but the contract drift is real. A five-line precheck closes both cases with one shape:
int signal_deliver_fault(hv_vcpu_t vcpu, guest_t *g, int signum, int *exit_code)
{
pthread_mutex_lock(&sig_lock);
uint64_t *blocked = thread_blocked_ptr();
linux_sigaction_t *act = &sig_state.actions[signum - 1];
if (act->sa_handler == LINUX_SIG_IGN ||
(*blocked & sig_bit(signum))) {
/* Linux force_sig_info_to_task: forced synchronous faults reset
* disposition to SIG_DFL, unblock the signum, then apply default. */
act->sa_handler = LINUX_SIG_DFL;
*blocked &= ~sig_bit(signum);
}
signal_rt_info_t rt_info = signal_default_info(signum);
return deliver_signal_locked(vcpu, g, signum, rt_info, exit_code);
}Update the "intentionally ignored" comment accordingly.
jserv
left a comment
There was a problem hiding this comment.
Rebase onto the latest main branch, resolve any merge conflicts, and refine the changes based on the review feedback.
There was a problem hiding this comment.
1 issue found across 6 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/syscall/proc.c">
<violation number="1" location="src/syscall/proc.c:1948">
P1: When UNKNOWN exits are treated as preemption, rseq abort can read a stale PC and fail to redirect to `abort_ip`, leaving a preempted rseq critical section to resume incorrectly. Consider syncing ELR/SPSR from `HV_REG_PC/HV_REG_CPSR` (or using `HV_REG_PC` directly) on UNKNOWN before the `rseq_try_abort()` path.</violation>
</file>
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
| } | ||
| } else if (vexit->reason == HV_EXIT_REASON_CANCELED) { | ||
| } else if (vexit->reason == HV_EXIT_REASON_CANCELED || | ||
| vexit->reason == HV_EXIT_REASON_UNKNOWN) { |
There was a problem hiding this comment.
P1: When UNKNOWN exits are treated as preemption, rseq abort can read a stale PC and fail to redirect to abort_ip, leaving a preempted rseq critical section to resume incorrectly. Consider syncing ELR/SPSR from HV_REG_PC/HV_REG_CPSR (or using HV_REG_PC directly) on UNKNOWN before the rseq_try_abort() path.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/syscall/proc.c, line 1948:
<comment>When UNKNOWN exits are treated as preemption, rseq abort can read a stale PC and fail to redirect to `abort_ip`, leaving a preempted rseq critical section to resume incorrectly. Consider syncing ELR/SPSR from `HV_REG_PC/HV_REG_CPSR` (or using `HV_REG_PC` directly) on UNKNOWN before the `rseq_try_abort()` path.</comment>
<file context>
@@ -1941,11 +1944,22 @@ int vcpu_run_loop(hv_vcpu_t vcpu,
}
- } else if (vexit->reason == HV_EXIT_REASON_CANCELED) {
+ } else if (vexit->reason == HV_EXIT_REASON_CANCELED ||
+ vexit->reason == HV_EXIT_REASON_UNKNOWN) {
/* Canceled by hv_vcpus_exit(). Can be: alarm timeout,
* exit_group from another thread, or signal preemption
</file context>
Note
Stacked on #76. This branch is based on
fix-cross-process-signal-el0(#76),whose commit appears here until #76 merges; afterwards this PR auto-reduces to the
single fault-delivery commit. Review #76 first.
A guest exception (SIGSEGV/SIGBUS/SIGILL/SIGFPE/SIGTRAP) was delivered via
signal_set_fault_info()(which records si_code/si_addr in a_Thread_localslot)followed by
signal_queue()+signal_deliver(). Butsignal_queue()sets a bit inthe process-wide pending mask, and every vCPU thread runs
signal_deliver()at its ownpoll points, so a fault raised by thread A could be dequeued and delivered by thread B —
whose thread-local fault info is empty. The siginfo then degraded to
si_code=SI_USERwith no
si_addrinstead ofSEGV_MAPERR/si_addr. A JVM treats such a SIGSEGV as afatal external signal rather than a recoverable implicit null check, so
javaccrashed.Two threads faulting on the same signal also collapsed into one bitmask bit, dropping
one fault.
Add
signal_deliver_fault(), which delivers a given signal directly to the currentthread using its thread-local fault info and never touches the process-wide pending set.
signal_deliver()'s frame-building tail is factored intodeliver_signal_locked()andshared. All four synchronous-fault sites in the vCPU loop (SEGV_ACCERR, BRK→SIGTRAP,
SIGILL, data-abort SIGSEGV) route through it.
Adds
test-fault-signal-mt: N threads each take recoverable SIGSEGVs and assert everydelivery is
SEGV_MAPERRwith the rightsi_addron the faulting thread. It failsdeterministically (rc=2) on the old queue-based path and passes on the fix.
Summary by cubic
Deliver synchronous fault signals to the faulting vCPU thread to keep correct
si_code/si_addrand avoid lost faults. Fixes JVM crashes caused bySIGSEGVdelivered asSI_USERand makes multi-threaded faults reliable.signal_deliver_fault()and routed all synchronous guest exceptions through it; bypasses the process-wide pending mask.deliver_signal_locked(); supports EL0 preemption by redirecting viaHV_REG_PC/CPSR.vcpu_run_loopto use the new path and to treatHV_EXIT_REASON_UNKNOWNlikeCANCELEDso preempted runs resume cleanly.test-fault-signal-mtto verify per-threadSIGSEGVdelivery; Makefile and manifest updated.Written for commit 23b2307. Summary will update on new commits.